Towards a Unified Exploitation of Electronic Dialectal Corpora: Problems and Perspectives
Abstract
In this paper, we address the problem of storing and retrieving dialectal data in a unified framework. In particular, we discuss issues in the design and implementation of a multimedia database that will contain written and oral data from three Greek dialects in Asia Minor. First, we describe the overall architecture of a system that lets the user store audio recordings, text transcripts, and other annotations. We then discuss the possibilities and limitations of a retrieval module that combines different linguistic levels for a unified exploitation of oral and written corpora.
Similar resources
Towards An Electronic Analysis of Svan Dialectal Divergences
This is a special internet edition of the article "Towards An Electronic Analysis of Svan Dialectal Divergences" by Jost Gippert (2000). It should not be cited. Citations should be taken from the original edition in Kartveluri memḳvidreoba / Kartvelian Heritage 4, 2000, 134-149.
Comparative Study of the Academic Vocabulary Content of Electronic Engineering Corpora, GE Materials and M.S. Entrance Examinations
The importance of vocabulary learning has been underlined in the field of English for Academic Purposes (EAP) because non-English majors who require reading English texts in their fields of study have to expand their English vocabulary knowledge much more efficiently than ordinary ESL/EFL learners. Since academic vocabulary instruction in Iranian universities is realized through the use of Gene...
Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity
In this study we analyze texts used in Russian Unified State Exam on English language. Texts that formed small research corpora were retrieved from 2 resources: official USE database as a reference point, and popular website used by pupils for USE training “Neznaika” (https://neznaika.pro/). The size of two corpora is balanced: USE has 11934 tokens and “Neznaika” - 11918 tokens. We share Biber’...
Evaluating the Use of Corpus-based Instruction in a Language Teacher Education Context: Perspectives from the Users
A recent practice in the study of language on teacher education programmes has been the use of electronic corpora, and we are therefore still at the initial stages of exploring key issues relating to their integration. Despite arguments for and against their adaptation, there is a dearth of evaluative research examining student teachers’ perceptions of learning and teaching through corpus-based...
A Multi-Dialect, Multi-Genre Corpus of Informal Written Arabic
This paper presents a multi-dialect, multi-genre, human annotated corpus of dialectal Arabic with data obtained from both online newspaper commentary and Twitter. Most Arabic corpora are small and focus on Modern Standard Arabic (MSA). There has been recent interest, however, in the construction of dialectal Arabic corpora (Zaidan and Callison-Burch, 2011a; Al-Sabbagh and Girju, 2012). This wor...
Publication date: 2014